Multi-Object Tracking

Decode-MOT: How Can We Hurdle Frames to Go Beyond Tracking-by-Detection?

Decode-MOT Decision Coordinator: A novel module that adaptively chooses between tracking-by-detection (TBD) and tracking-by-motion (TBM) at each frame, boosting speed without much accuracy loss.
Contextual Learning Framework:
- Scene Context Learning via attention-based comparison of convolutional features across frames.
- Tracking Context Learning based on motion and object count (cardinality) similarity.
Self-Supervised Learning Approach: A strategy to train the decision coordinator without ground truth, using pseudo labels derived from contextual similarities between TBD and TBM results.
Hierarchical Confidence Association: A multi-stage track-detection association strategy that leverages track/detection confidence to reduce association ambiguity progressively.
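As a rough illustration of the hierarchical confidence association idea, the sketch below matches high-confidence tracks and detections first, then matches whatever is left in a second stage. It is a minimal sketch under stated assumptions, not the paper's procedure: the IoU affinity, the greedy matcher, the two-stage split, and the names `conf_thresh`, `min_iou`, and `hierarchical_associate` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, t_ids, d_ids, min_iou) -> Dict[int, int]:
    """Greedily pair tracks and detections in order of decreasing IoU."""
    scored = sorted(
        ((iou(tracks[t]["box"], dets[d]["box"]), t, d) for t in t_ids for d in d_ids),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, t, d in scored:
        if score < min_iou:
            break  # remaining pairs overlap too little to be linked
        if t not in matches and d not in used:
            matches[t] = d
            used.add(d)
    return matches


def hierarchical_associate(tracks, dets, conf_thresh=0.5, min_iou=0.3):
    """Two-stage association (assumed): confident items first, then the rest."""
    free_t: List[int] = list(range(len(tracks)))
    free_d: List[int] = list(range(len(dets)))
    matches: Dict[int, int] = {}
    for stage in (1, 2):
        if stage == 1:
            # Stage 1: only tracks and detections with high confidence.
            t_ids = [t for t in free_t if tracks[t]["conf"] >= conf_thresh]
            d_ids = [d for d in free_d if dets[d]["conf"] >= conf_thresh]
        else:
            # Stage 2: everything still unmatched, regardless of confidence.
            t_ids, d_ids = free_t, free_d
        found = greedy_match(tracks, dets, t_ids, d_ids, min_iou)
        matches.update(found)
        free_t = [t for t in free_t if t not in found]
        free_d = [d for d in free_d if d not in found.values()]
    return matches, free_t, free_d
```

Resolving the confident pairs first shrinks the candidate set for the ambiguous ones, which is the intuition behind reducing association ambiguity progressively. A Hungarian solver could replace the greedy matcher without changing the staging.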

Fig 1. Accuracy and speed of recent methods on the MOTChallenge dataset.

Proposed Model Architecture:

Fig 2. The overall architecture of our Decode-MOT. It consists of (a) a decision coordinator that predicts the probability of TBM, (b) a scene context representation module that evaluates long-term attention between different frames, and (c) a hierarchical association that progressively links detections and tracks.
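The per-frame control flow implied by this architecture can be sketched as follows. This is a minimal sketch, assuming the coordinator returns a scalar probability of TBM that is compared against a fixed threshold, and that the first frame always runs the detector because no tracks exist yet. The callables `tbm_probability`, `track_by_detection`, and `track_by_motion`, and the `threshold` parameter, are hypothetical placeholders rather than names from the paper.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def run_tracker(
    frames: Sequence[Any],
    tbm_probability: Callable[[Any, Any], float],
    track_by_detection: Callable[[Any, Optional[Any]], Any],
    track_by_motion: Callable[[Any], Any],
    threshold: float = 0.5,
) -> Tuple[List[str], Optional[Any]]:
    """Choose TBD or TBM per frame from the coordinator's TBM probability."""
    tracks: Optional[Any] = None
    modes: List[str] = []
    for frame in frames:
        # Without existing tracks there is nothing to propagate, so detect.
        if tracks is not None and tbm_probability(frame, tracks) > threshold:
            tracks = track_by_motion(tracks)  # cheap path: skip the detector
            modes.append("TBM")
        else:
            tracks = track_by_detection(frame, tracks)  # full detection path
            modes.append("TBD")
    return modes, tracks
```

The returned `modes` list makes it easy to count how often the detector was skipped, which is where any speed gain would come from.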

Proposed Contextual Learning:

Performance Results

Comparison among our Decode-MOT, the baseline with the hierarchical association, and the baseline tracker with different TDRs on the MOT15 dataset. The percentages in [·] show the speed gain and accuracy reduction rates of each tracker as TDR decreases.

Comparison with SOTA Methods